---
title: "iptmnetr_use_case"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## Introduction
In this R Markdown document, we re-analyze data from a published study on the response of lung cancer cells to the tyrosine kinase inhibitor erlotinib (PMID: 25404012). Erlotinib is used as a therapeutic agent in lung cancer patients whose tumors carry mutations in the epidermal growth factor receptor (EGFR). Patients initially respond well to the drug but inevitably develop resistance. We focus on 243 phosphorylation sites in 194 proteins that were significantly upregulated by treatment with the EGFR ligand epidermal growth factor (EGF) and downregulated by erlotinib. These sites are likely targets of EGFR-regulated pathways that are inhibited by the drug. We retrieve kinases for these sites from iPTMnet using the iptmnetr package and then compute basic summary statistics on the results.

## Retrieving Kinase Information
In this part, we use iptmnetr to retrieve kinases from iPTMnet for the EGFR/erlotinib-regulated sites and write the resulting table of kinase-site relationships to the file egfr_kinases.txt. The sites are listed in the input file egfr_sites_formatted.txt, which has three tab-delimited columns: the UniProt accession (UniProtAC) of the phosphorylated protein, the amino acid residue of the phosphorylated site, and the position of that site (e.g., P12345 S 100).
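Before querying iPTMnet, it can help to confirm that the site file matches the format just described. The sketch below uses a hypothetical helper, `check_site_file()`, which is not part of iptmnetr, and it assumes the file has no header row. It reads every column as text, because R's default type detection would turn a residue column that happens to contain only `T` into logical values.

```r
# Hypothetical helper (not part of iptmnetr): read a site file in the
# three-column, tab-delimited format and flag rows that do not fit it.
# Assumes the file has no header row.
check_site_file <- function(path) {
  # Read every column as character so a residue column of only "T"
  # is not auto-converted to logical TRUE
  sites <- read.delim(path, header = FALSE, colClasses = "character",
                      col.names = c("uniprot_ac", "residue", "position"))
  # Convert positions to integers; non-numeric entries become NA
  sites$position <- suppressWarnings(as.integer(sites$position))
  # Flag residues other than S/T/Y and missing or non-positive positions
  bad <- (!sites$residue %in% c("S", "T", "Y")) |
    is.na(sites$position) | sites$position < 1
  if (any(bad)) {
    warning(sum(bad), " row(s) do not match the expected format")
  }
  sites
}

# Usage with a temporary file (P12345 is a placeholder accession)
site_file <- tempfile(fileext = ".txt")
writeLines(c("P12345\tS\t100", "P12345\tY\t204"), site_file)
sites <- check_site_file(site_file)
sites
```

The helper only warns rather than stopping, so malformed rows can be inspected before they are sent to iPTMnet.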
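```{r kinase_retrieval}
# Load iptmnetr from the default library path
library(iptmnetr)

# Point iptmnetr at the iPTMnet server used for this analysis
set_host_url("http://34.233.45.77")

# Query iPTMnet for kinases of the listed sites; the path is relative to the
# working directory (by default, the folder containing this .Rmd file)
kinase_info <- get_ptm_enzymes_from_file("egfr_sites_formatted.txt")

# Save the kinase-site table as tab-delimited text; col.names = NA leaves an
# empty header cell above the row names so the columns stay aligned
write.table(kinase_info, file = "egfr_kinases.txt", sep = "\t", quote = FALSE, col.names = NA)

# Preview the first rows
head(kinase_info)
```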
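Because `write.table()` is called with `col.names = NA`, the saved file begins with an empty header cell above the row names. A minimal sketch of reading such a file back is shown below; it uses placeholder rows in place of the real `kinase_info`, with column names mirroring those used in this analysis and values invented for illustration.

```r
# Placeholder rows standing in for kinase_info; column names follow this
# analysis, values are invented for illustration
kinase_example <- data.frame(
  sub_id = c("P12345", "P12345"),
  site = c("S100", "Y204"),
  enz_id = c("Q00001", "Q00002"),
  enz_name = c("KinaseA", "KinaseB"),
  stringsAsFactors = FALSE
)

# Write it the same way as above: row names plus an empty header cell
out_file <- tempfile(fileext = ".txt")
write.table(kinase_example, file = out_file, sep = "\t", quote = FALSE,
            col.names = NA)

# row.names = 1 turns the unnamed first column back into row names;
# colClasses = "character" keeps every value as text
kinase_back <- read.delim(out_file, row.names = 1, colClasses = "character")
head(kinase_back)
```

Since the file is written with `quote = FALSE`, this round trip assumes no field contains a tab or newline.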

## Basic Statistics

Next, we compute:

- Number of kinase-site pairs
- Number of sites with at least one kinase
- Number of kinases
- Number of sites per kinase
- Number of kinases that phosphorylate three or more sites


```{r kinase_stats}
# Count kinase-site pairs (one row per pair in kinase_info)
num_kinase_site_pairs <- nrow(kinase_info)
num_kinase_site_pairs

# Count sites with at least one kinase: build a "protein site" key and count distinct keys
kinase_info$full_site <- paste(kinase_info$sub_id, kinase_info$site, sep = " ")
num_unique_sites <- length(unique(kinase_info$full_site))
num_unique_sites

# Count unique kinases by enzyme ID
num_unique_kinases <- length(unique(kinase_info$enz_id))
num_unique_kinases

# Tally kinase-site rows per kinase name; plyr::count is namespaced to avoid
# a clash with dplyr::count if dplyr is also loaded
kinase_tally <- plyr::count(kinase_info, "enz_name")
kinase_tally_sorted <- kinase_tally[order(-kinase_tally$freq), ]
kinase_tally_sorted

# Count kinases that phosphorylate three or more sites
high_freq_kinases <- kinase_tally_sorted[kinase_tally_sorted$freq >= 3, ]
num_high_freq_kinases <- nrow(high_freq_kinases)
num_high_freq_kinases
```
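The same statistics can be sketched in base R on a small toy table, which makes the counting logic easy to check by hand. The IDs and names below are placeholders, not iPTMnet results, and `table()` stands in for `plyr::count()`.

```r
# Toy kinase-site table with placeholder IDs (not real iPTMnet results)
toy <- data.frame(
  sub_id   = c("P1", "P1", "P2", "P3", "P3", "P4"),
  site     = c("S10", "S10", "T5", "Y7", "S2", "S9"),
  enz_id   = c("K1", "K2", "K1", "K1", "K2", "K3"),
  enz_name = c("KIN1", "KIN2", "KIN1", "KIN1", "KIN2", "KIN3"),
  stringsAsFactors = FALSE
)

n_pairs <- nrow(toy)                                    # 6 kinase-site pairs
n_sites <- length(unique(paste(toy$sub_id, toy$site)))  # 5 distinct sites
n_kinases <- length(unique(toy$enz_id))                 # 3 kinases

# table() counts rows per kinase name; sort() orders by decreasing count
sites_per_kinase <- sort(table(toy$enz_name), decreasing = TRUE)
sites_per_kinase

# Kinases with three or more rows: only KIN1 here
n_high_freq <- sum(sites_per_kinase >= 3)
n_high_freq
```

Both `table()` and `plyr::count()` count rows, so a kinase's tally equals its number of distinct sites only if each kinase-site pair appears once in the table.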

